    Online Influence Maximization (Extended Version)

    Social networks are commonly used for marketing purposes. For example, free samples of a product can be given to a few influential social network users (or "seed nodes"), with the hope that they will convince their friends to buy it. One way to formalize marketers' objective is through influence maximization (or IM), whose goal is to find the best seed nodes to activate under a fixed budget, so that the number of people who are influenced in the end is maximized. Recent solutions to IM rely on the influence probability, i.e., the probability that one user influences another. However, this probability information may be unavailable or incomplete. In this paper, we study IM in the absence of complete information on influence probabilities. We call this problem Online Influence Maximization (OIM), since we learn influence probabilities at the same time as we run influence campaigns. To solve OIM, we propose a multiple-trial approach, where (1) some seed nodes are selected based on existing influence information; (2) an influence campaign is started with these seed nodes; and (3) users' feedback is used to update the influence information. We adopt the Explore-Exploit strategy, which selects seed nodes using either the current influence probability estimates (exploit) or confidence bounds on those estimates (explore). Any existing IM algorithm can be used in this framework. We also develop an incremental algorithm that significantly reduces the overhead of handling users' feedback. Our experiments show that our solution is more effective than traditional IM methods under partial information. (Comment: 13 pages. To appear in KDD 2015. Extended version.)
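
    Below is a minimal Python sketch of the multiple-trial explore-exploit loop described above. The Beta-posterior bookkeeping, the epsilon-greedy switch between posterior means and optimistic bounds, and the degree-based seed heuristic are illustrative assumptions; the paper's framework admits any IM algorithm and its own confidence-bound design.

        import random
        from collections import defaultdict

        class OnlineIM:
            """Explore-exploit loop over repeated influence campaigns.

            Each directed edge keeps a Beta(alpha, beta) posterior over its
            unknown influence probability, updated from campaign feedback.
            """

            def __init__(self, edges, epsilon=0.1):
                self.edges = edges                     # directed (u, v) pairs
                self.eps = epsilon                     # chance of exploring
                self.alpha = defaultdict(lambda: 1.0)  # activations + prior
                self.beta = defaultdict(lambda: 1.0)   # failures + prior

            def edge_prob(self, e, explore):
                a, b = self.alpha[e], self.beta[e]
                mean = a / (a + b)
                if not explore:
                    return mean                        # exploit: posterior mean
                var = a * b / ((a + b) ** 2 * (a + b + 1))
                return min(1.0, mean + var ** 0.5)     # explore: optimistic bound

            def select_seeds(self, k, explore):
                # Stand-in for a real IM algorithm: rank nodes by the expected
                # number of neighbours they influence directly.
                score = defaultdict(float)
                for u, v in self.edges:
                    score[u] += self.edge_prob((u, v), explore)
                return sorted(score, key=score.get, reverse=True)[:k]

            def run_trial(self, k, true_prob):
                explore = random.random() < self.eps
                seeds = self.select_seeds(k, explore)
                active, frontier = set(seeds), list(seeds)
                while frontier:                        # independent-cascade campaign
                    u = frontier.pop()
                    for s, v in self.edges:
                        if s != u or v in active:
                            continue
                        if random.random() < true_prob[(s, v)]:
                            self.alpha[(s, v)] += 1    # feedback: edge fired
                            active.add(v)
                            frontier.append(v)
                        else:
                            self.beta[(s, v)] += 1     # feedback: edge failed
                return len(active)

    Over many trials the posteriors concentrate, so the exploit branch increasingly picks genuinely influential seeds; the paper's incremental feedback handling would replace the full recomputation done in select_seeds here.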

    Cleaning algorithms for novel applications

    The information managed in emerging applications, such as location-based services, sensor networks, and crowdsourcing systems, is usually imperfect. In many situations, data can be cleaned (e.g., removed or reduced) by performing appropriate operations. In this thesis, we study the cleaning problem under limited resources for two novel applications: querying probabilistic data, and collecting data from human intelligence tasks in crowdsourcing environments.

    Probabilistic databases have recently been developed to handle uncertain data. For example, the temperature readings in a sensor network may be uncertain due to the lack of the latest readings from every sensor at every moment. A probabilistic database captures the real value distributions of the readings and enables the evaluation of probabilistic queries on the data. However, data uncertainty may lead to ambiguous query results. By performing cleaning operations on the data, for example, probing some sensors for their latest readings, the ambiguity in query results can be reduced. We first study how to quantify the ambiguity of the results returned by a probabilistic top-k query. We develop efficient algorithms to compute the quality of this query under the possible-world semantics. We further address the cleaning of a probabilistic database in order to improve top-k query quality. Specifically, we account for the facts that cleaning may involve a cost and may fail. We propose optimal cleaning algorithms, as well as several heuristics, to select the data to clean under a limited budget.

    In a crowdsourcing system, Human Intelligence Tasks (HITs) (e.g., translating sentences, matching photos, tagging videos with keywords) can be conveniently specified to collect data. HITs are made available to a large pool of workers, who are paid upon completing the HITs they have selected. Since workers may be casual Internet users, their answers are hardly perfect. If more workers are employed to perform a HIT, the quality of the HIT's answer can be statistically improved. Hence, choosing the number of workers (or plurality) for each HIT is an effective way to reduce (or clean) the imperfectness of the collected data (i.e., the HITs' answers). We address the important problem of determining the plurality of each HIT so that the overall answer quality is optimized. We propose a dynamic programming (DP) algorithm for solving this plurality assignment problem (PAP). We identify two properties, namely monotonicity and diminishing return, which are satisfied by a HIT if the quality of its answer increases monotonically at a decreasing rate with its plurality. We show that for HITs satisfying these two properties (e.g., multiple-choice-question HITs), the PAP is approximable, and we propose an efficient greedy algorithm for this case. (Doctoral thesis, Doctor of Philosophy, Computer Science.)
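
    The greedy algorithm for the approximable case can be sketched as follows: while budget remains, give one more worker to the HIT whose answer quality would improve the most. The saturating quality curve below is a hypothetical placeholder, not the thesis's actual quality model.

        import heapq
        import math

        def assign_plurality(quality_fns, budget):
            """Greedily assign `budget` workers across HITs whose quality
            functions are monotone with diminishing returns in plurality."""
            n = [0] * len(quality_fns)
            # Max-heap (negated keys) on the marginal gain of one more worker.
            heap = [(-(q(1) - q(0)), i) for i, q in enumerate(quality_fns)]
            heapq.heapify(heap)
            for _ in range(budget):
                _, i = heapq.heappop(heap)
                n[i] += 1
                q = quality_fns[i]
                heapq.heappush(heap, (-(q(n[i] + 1) - q(n[i])), i))
            return n

        # Hypothetical quality curve: saturates toward 1 at a rate that grows
        # with per-worker accuracy p (illustration only; needs p > 0.5).
        def quality_curve(p):
            return lambda n: 1.0 - (1.0 - p) * math.exp(-n * (2.0 * p - 1.0))

        hits = [quality_curve(p) for p in (0.6, 0.7, 0.9)]
        print(assign_plurality(hits, budget=10))   # plurality chosen per HIT

    Because each marginal gain is non-increasing, picking the largest remaining gain at every step is the standard exchange-argument greedy for such separable problems; the thesis's DP would handle HITs without these properties.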

    On incentive-based tagging

    A social tagging system, such as del.icio.us or Flickr, allows users to annotate resources (e.g., web pages and photos) with text descriptions called tags. Tags have proven to be invaluable information for searching, mining, and recommending resources. In practice, however, not all resources receive the same attention from users. As a result, while some highly popular resources are over-tagged, most resources are under-tagged. Incomplete tagging severely affects the effectiveness of all tag-based techniques and applications. We address an interesting question: if users are paid to tag specific resources, how can we allocate incentives to resources in a crowdsourcing environment so as to maximize the tagging quality of resources? We address this question by observing that the tagging quality of a resource becomes stable after it has been tagged a sufficient number of times. We formalize the concepts of tagging quality (TQ) and tagging stability (TS) to measure the quality of a resource's tag description. We propose a theoretically optimal algorithm given a fixed "budget" (i.e., the amount of money paid for tagging resources). This solution decides the amount of reward that should be invested in each resource in order to maximize tagging stability. We further propose a few simple, practical, and efficient incentive allocation strategies. On a dataset from del.icio.us, our best strategy provides resources with a close-to-optimal gain in tagging stability.
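
    The notion of tagging stability can be illustrated with a small sketch: treat a resource's tags as a frequency distribution and call the resource stable once new tags barely change that distribution. The cosine-similarity measure and ten-tag window below are assumptions for illustration, not the paper's exact TQ/TS definitions.

        from collections import Counter
        from math import sqrt

        def tag_distribution(tags):
            """Normalized tag-frequency vector for one resource."""
            counts = Counter(tags)
            total = sum(counts.values())
            return {t: c / total for t, c in counts.items()}

        def tagging_stability(tags, window=10):
            """Cosine similarity between the resource's tag distribution
            before and after its most recent `window` tags. A value near 1
            means additional tags no longer change the description much."""
            if len(tags) <= window:
                return 0.0
            old = tag_distribution(tags[:-window])
            new = tag_distribution(tags)
            dot = sum(old.get(t, 0.0) * new.get(t, 0.0) for t in new)
            norm = (sqrt(sum(v * v for v in old.values()))
                    * sqrt(sum(v * v for v in new.values())))
            return dot / norm if norm else 0.0

        # A resource tagged 30 times converges toward a stable description.
        history = ["python"] * 12 + ["code"] * 9 + ["tutorial"] * 6 + ["web"] * 3
        print(round(tagging_stability(history), 3))

    A budgeted allocation strategy would then spend rewards on the resources whose stability is furthest from this saturation point, where each additional tag buys the largest gain.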

    Impaired Spatial Firing Representations of Neurons in the Medial Entorhinal Cortex of the Epileptic Rat Using Microelectrode Arrays

    Epilepsy severely impairs the cognitive behavior of patients. It remains unclear whether epilepsy-induced cognitive impairment is associated with neuronal activities in the medial entorhinal cortex (MEC), a region known for its involvement in spatial cognition. To explore this neural mechanism, we recorded spikes and local field potentials from MEC neurons in lithium–pilocarpine-induced epileptic rats using self-designed microelectrode arrays. Through the open field test, we identified spatial cells exhibiting spatially selective firing properties and assessed their spatial representations in relation to the progression of epilepsy. Meanwhile, we analyzed theta oscillations and theta modulation in both excitatory and inhibitory neurons. Furthermore, we used a novel object recognition test to evaluate changes in the spatial cognitive ability of epileptic rats. After epilepsy modeling, the spatial tuning of various types of spatial cells suffered rapid and pronounced damage during the latent period (1 to 5 d). Subsequently, the firing characteristics and theta oscillations were impaired. In the chronic period (>10 d), performance in the novel object recognition experiment deteriorated. In conclusion, our study demonstrates a detrimental effect on the spatial representations and electrophysiological properties of MEC neurons during the epileptic latent period, suggesting the potential use of these changes as a "functional biomarker" for predicting cognitive impairment caused by epilepsy.
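
    For the theta-oscillation analysis mentioned above, a typical first step is to isolate the theta band from the recorded LFP. A minimal sketch follows, assuming SciPy, a 4–12 Hz band, and a Hilbert-envelope power estimate; the study's actual pipeline is not specified here.

        import numpy as np
        from scipy.signal import butter, filtfilt, hilbert

        def theta_band_power(lfp, fs, band=(4.0, 12.0)):
            """Band-pass an LFP trace to the theta range and return its
            mean instantaneous power via the Hilbert envelope.

            lfp: 1-D array of local field potential samples.
            fs: sampling rate in Hz.
            band: theta band edges in Hz (4-12 Hz is a common convention).
            """
            nyq = fs / 2.0
            b, a = butter(3, [band[0] / nyq, band[1] / nyq], btype="band")
            theta = filtfilt(b, a, lfp)            # zero-phase theta signal
            envelope = np.abs(hilbert(theta))      # instantaneous amplitude
            return np.mean(envelope ** 2)

        # Synthetic example: 8 Hz theta rhythm buried in noise, 1 kHz sampling.
        fs = 1000.0
        t = np.arange(0, 10, 1 / fs)
        lfp = np.sin(2 * np.pi * 8 * t) + 0.5 * np.random.randn(t.size)
        print(theta_band_power(lfp, fs))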